Add inference backend A/B benchmark #2777

Draft
RitwijParmar wants to merge 1 commit into PrimeIntellect-ai:main from RitwijParmar:feat/dynamo-rollout-benchmark

RitwijParmar commented Jun 11, 2026

Summary

  • add an endpoint-level benchmark framework for OpenAI-compatible inference backends, such as vLLM servers, routers, and experimental Dynamo deployments
  • support both one-off comparisons and JSON scenario-suite runs
  • include a ready-to-run four-profile suite for short-rollout latency, long-context prefill, high-concurrency decode, and session-cache reuse
  • exclude warmup requests, then measure request throughput, output-token throughput, streaming time to first token (TTFT), p50/p95/p99 latency, error rate, and per-request failures
  • snapshot each backend's /metrics endpoint before and after its run, and use a self-contained vLLM counter parser to derive token deltas, prefix-cache hit rate, and NIXL failure counts
  • add optional per-scenario regression gates so backend experiments can fail on throughput, latency, or error-rate regressions before being wired into full RL runs
  • write aggregate Markdown reports and JSON samples so backend comparisons can be reviewed and debugged after the run
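The warmup exclusion and percentile reporting described above can be sketched as follows. This is a minimal illustration, not the code in benchmarks/scripts/inference_backend_benchmark.py: the RequestSample record, its field names, and the choice of linear-interpolation percentiles are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSample:
    # Hypothetical per-request record; field names are illustrative, not the PR's schema.
    start_s: float                  # time the request was sent (seconds)
    first_token_s: Optional[float]  # arrival of the first streamed token, None if none arrived
    end_s: float                    # time the request finished or failed
    output_tokens: int
    ok: bool
    warmup: bool = False


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default method); NaN for empty input."""
    if not values:
        return math.nan
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(samples):
    """Aggregate non-warmup samples into throughput, TTFT, latency, and error-rate figures."""
    measured = [s for s in samples if not s.warmup]  # warmup requests are dropped entirely
    if not measured:
        raise ValueError("no non-warmup samples to summarize")
    ok = [s for s in measured if s.ok]
    # Wall-clock window spanned by all measured requests, failed ones included.
    window = max(s.end_s for s in measured) - min(s.start_s for s in measured)
    if window <= 0:
        raise ValueError("measurement window must be positive")
    latencies = [s.end_s - s.start_s for s in ok]
    ttfts = [s.first_token_s - s.start_s for s in ok if s.first_token_s is not None]
    summary = {
        "request_throughput": len(ok) / window,  # successful requests per second
        "output_throughput": sum(s.output_tokens for s in ok) / window,  # output tokens per second
        "error_rate": (len(measured) - len(ok)) / len(measured),
    }
    for q in (50, 95, 99):
        summary[f"latency_p{q}"] = percentile(latencies, q)
        summary[f"ttft_p{q}"] = percentile(ttfts, q)
    return summary
```

In this sketch failed requests still widen the measurement window but never count toward throughput, so a backend that fails quickly cannot look faster than one that completes its work.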
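The before/after /metrics snapshot works because vLLM's Prometheus counters are cumulative, so subtracting two scrapes yields the work done during the run. The sketch below illustrates that idea only; it is not the PR's parser. It ignores labels (summing every series that shares a metric name), treats a decrease as a counter reset, and its default prefix-cache metric names are assumptions that differ between vLLM versions, so check them against your server's /metrics output. NIXL failure counters would be diffed the same way, but their names are not assumed here.

```python
import math
from collections import defaultdict


def parse_counters(text: str) -> dict:
    """Simplified Prometheus text parser: sums all samples per metric name, ignoring labels."""
    totals = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]  # text after the label set
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        value = float(fields[0])  # an optional trailing timestamp is ignored
        if not math.isnan(value):
            totals[name] += value
    return dict(totals)


def counter_deltas(before: dict, after: dict) -> dict:
    """Per-metric increase between two scrapes of cumulative counters."""
    deltas = {}
    for name, end in after.items():
        delta = end - before.get(name, 0.0)
        # A negative delta means the counter reset (e.g. server restart); count from zero.
        deltas[name] = end if delta < 0 else delta
    return deltas


def prefix_cache_hit_rate(
    deltas: dict,
    hits: str = "vllm:prefix_cache_hits_total",  # assumed name; varies by vLLM version
    queries: str = "vllm:prefix_cache_queries_total",  # assumed name; varies by vLLM version
):
    """Hit rate over the run, or None if no prefix-cache queries were recorded."""
    q = deltas.get(queries, 0.0)
    return deltas.get(hits, 0.0) / q if q > 0 else None
```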

Related to #1166. This PR does not duplicate the Dynamo backend implementation; instead, it gives the project a repeatable way to compare Dynamo, vLLM, or router endpoints under rollout-like traffic before a backend is moved into training.

Checks

  • uv run --no-sync ruff check benchmarks/scripts/inference_backend_benchmark.py tests/unit/test_inference_backend_benchmark.py
  • uv run --no-sync python -m py_compile benchmarks/scripts/inference_backend_benchmark.py tests/unit/test_inference_backend_benchmark.py
  • PYTHONPATH=src uv run --no-sync pytest --confcutdir=tests/unit tests/unit/test_inference_backend_benchmark.py -q
  • git diff --check

Note: a regular uv run (which syncs the environment first) fails locally on macOS because this checkout's lockfile only supports Linux platforms. The repo-level pytest conftest also pulls in heavier runtime setup that is unrelated to this pure benchmark module, so I ran the focused tests with --confcutdir=tests/unit after populating submodules and installing the minimal local test tooling.

RitwijParmar force-pushed the feat/dynamo-rollout-benchmark branch from 9100ad5 to 58d62e4 (June 11, 2026 21:45)
RitwijParmar force-pushed the feat/dynamo-rollout-benchmark branch from 58d62e4 to 0beec65 (June 11, 2026 21:57)
RitwijParmar (Author)

I opened this as a draft because the next useful step is a real vLLM vs Dynamo run.

If there is a preferred Dynamo branch or launch command, I can run the suite against it and add the result artifact here.

The suite covers short rollout latency, long-context prefill, high-concurrency decode, and session-cache reuse.
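A scenario suite with optional per-scenario regression gates for these four profiles might look like the sketch below. The JSON schema, field names, gate keys, and every number in it are hypothetical placeholders, not values from the PR and not recommended settings. The gates here are absolute thresholds checked against a summary dict with keys such as output_throughput and latency_p95; the PR's gates may instead be defined relative to a baseline backend.

```python
import json

# Hypothetical suite file; all names and numbers are placeholders.
SUITE_JSON = """
{
  "scenarios": [
    {"name": "short_rollout_latency", "concurrency": 8, "max_tokens": 256,
     "gates": {"max_latency_p95_s": 5.0, "max_error_rate": 0.0}},
    {"name": "long_context_prefill", "concurrency": 4, "prompt_tokens": 32000,
     "gates": {"max_ttft_p95_s": 10.0}},
    {"name": "high_concurrency_decode", "concurrency": 256, "max_tokens": 1024,
     "gates": {"min_output_throughput": 1000.0}},
    {"name": "session_cache_reuse", "concurrency": 16, "turns_per_session": 4}
  ]
}
"""

# Gate key -> (summary metric, direction). "min" gates fail when the metric falls
# below the threshold; "max" gates fail when it rises above it.
GATE_RULES = {
    "min_output_throughput": ("output_throughput", "min"),
    "min_request_throughput": ("request_throughput", "min"),
    "max_latency_p95_s": ("latency_p95", "max"),
    "max_ttft_p95_s": ("ttft_p95", "max"),
    "max_error_rate": ("error_rate", "max"),
}


def check_gates(scenario_name, summary, gates):
    """Return one human-readable message per violated gate (empty list means pass)."""
    failures = []
    for gate, threshold in gates.items():
        if gate not in GATE_RULES:
            raise KeyError(f"unknown gate {gate!r} in scenario {scenario_name!r}")
        metric, direction = GATE_RULES[gate]
        value = summary[metric]
        if (direction == "min" and value < threshold) or (direction == "max" and value > threshold):
            failures.append(f"{scenario_name}: {metric}={value:g} violates {gate}={threshold:g}")
    return failures


suite = json.loads(SUITE_JSON)
```

A runner built on this would iterate over suite["scenarios"], summarize each run, call check_gates with scenario.get("gates", {}), and exit non-zero if any scenario returns failures, which is what lets a backend experiment fail fast before it reaches a full RL run.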

